Data Modeling

Quiz

SOLUTION:

An embedded data model is preferred for a “contains” relationship between entities.
A write operation is atomic for a single document, including its embedded documents.
With document references, the application must maintain the integrity of the relationship.

Patterns

One-to-One with Embedded documents

Consider the following example that maps patient and address relationships. The example illustrates the advantage of embedding over referencing if you need to view one data entity in the context of the other. In this one-to-one relationship between patient and address data, the address belongs to the patient.
In the normalized data model, the address document contains a reference to the patient document.

{
   _id: "lakshmi",
   name: "Lakshmi Natarajan"
}

{
   patient_id: "lakshmi",
   street: "149 Main St",
   city: "Birmingham",
   state: "AL"
}

If the address data is frequently retrieved with the patient information, then with referencing your application needs to issue multiple queries to resolve the reference. A better data model is to embed the address data in the patient data, like this:

{
   _id: "lakshmi",
   name: "Lakshmi Natarajan",
   address: {
      street: "149 Main St",
      city: "Birmingham",
      state: "AL"
   }
}

With the embedded data model, your application can retrieve the complete patient information with one query.
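
The difference in query count can be sketched in memory. This is only an illustration, not MongoDB driver code: plain Python dicts and lists stand in for collections, and the helper `find_one`, the `stats` counter, and the collection variable names are hypothetical.

```python
# In-memory stand-ins for MongoDB collections (assumption: no real database).
patients_normalized = [{"_id": "lakshmi", "name": "Lakshmi Natarajan"}]
addresses = [
    {"patient_id": "lakshmi", "street": "149 Main St",
     "city": "Birmingham", "state": "AL"}
]
patients_embedded = [
    {
        "_id": "lakshmi",
        "name": "Lakshmi Natarajan",
        "address": {"street": "149 Main St", "city": "Birmingham", "state": "AL"},
    }
]

stats = {"queries": 0}  # counts simulated round trips to the database


def find_one(collection, field, value):
    """Hypothetical helper: return the first document whose field equals value."""
    stats["queries"] += 1
    for doc in collection:
        if doc.get(field) == value:
            return doc
    return None


# Normalized model: one lookup for the patient, a second one for the address.
patient = find_one(patients_normalized, "_id", "lakshmi")
address = find_one(addresses, "patient_id", patient["_id"])
normalized_queries = stats["queries"]

# Embedded model: a single lookup returns the patient together with the address.
stats["queries"] = 0
patient_full = find_one(patients_embedded, "_id", "lakshmi")
embedded_queries = stats["queries"]
```

In this sketch the normalized model needs two lookups and the embedded model needs one, which mirrors the trade-off described above.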

One-to-Many with Embedded documents

Consider the following example that maps patient and multiple addresses. The example illustrates the advantage of embedding over referencing if you need to view many data entities in the context of another. In this one-to-many relationship between patient and address data, the patient has multiple address entities.

In the normalized data model, the address documents contain a reference to the patient document.

{
   _id: "lakshmi",
   name: "Lakshmi Natarajan"
}

{
   patient_id: "lakshmi",
   street: "149 Main St",
   city: "Birmingham",
   state: "AL"
}

{
   patient_id: "lakshmi",
   street: "298 Second St",
   city: "Birmingham",
   state: "AL"
}

If your application frequently retrieves the address data with the patient information, then your application needs to issue multiple queries to resolve the references. A better schema is to embed the address data entities in the patient data, like this:

{
   _id: "lakshmi",
   name: "Lakshmi Natarajan",
   addresses: [
      {
         street: "149 Main St",
         city: "Birmingham",
         state: "AL"
      },
      {
         street: "298 Second St",
         city: "Birmingham",
         state: "AL"
      }
   ]
}
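
As a rough sketch of how the normalized documents map onto the embedded form, the following folds the referencing address documents into an array on the patient document. Plain Python dicts stand in for MongoDB documents, and the function name `embed_addresses` is hypothetical.

```python
# Normalized documents, as plain dicts (assumption: no real database involved).
patient = {"_id": "lakshmi", "name": "Lakshmi Natarajan"}
address_docs = [
    {"patient_id": "lakshmi", "street": "149 Main St",
     "city": "Birmingham", "state": "AL"},
    {"patient_id": "lakshmi", "street": "298 Second St",
     "city": "Birmingham", "state": "AL"},
]


def embed_addresses(patient_doc, addresses):
    """Return a copy of patient_doc with its matching addresses embedded."""
    embedded = dict(patient_doc)
    embedded["addresses"] = [
        # Drop the back-reference: it is redundant once the address is embedded.
        {key: value for key, value in addr.items() if key != "patient_id"}
        for addr in addresses
        if addr["patient_id"] == patient_doc["_id"]
    ]
    return embedded


patient_embedded = embed_addresses(patient, address_docs)
streets = [addr["street"] for addr in patient_embedded["addresses"]]
```

Once the addresses are embedded, reading all of them is a matter of reading a single patient document.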

One-to-Many with Document references

Consider this example that maps movie and artist relationships. The example illustrates the advantage of referencing over embedding to avoid repetition of the artist information.

Embedding the artist document inside the movie document would lead to repetition of the artist data, as the following documents show:

{
    "_id": "12345",
    "title": "Seven",
    "cast": [{
        "name": "Brad Pitt",
        "gender": "male",
        "age": 51
    },
    {
        "name": "Morgan Freeman",
        "gender": "male",
        "age": 70
    },
    {
        "name": "Gwyneth Paltrow",
        "gender": "female",
        "age": 46
    }]
}

{
    "_id": "12346",
    "title": "Fight Club",
    "cast": [{
        "name": "Brad Pitt",
        "gender": "male",
        "age": 51
    }]
}

Artist information is repeated across multiple movie documents, which also makes the artist details harder to update. To change the age field of an actor, every movie document that embeds that artist must be updated.
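
The update cost of the embedded model can be sketched as follows. Python dicts stand in for the movie documents (other cast members are omitted for brevity), and the function `update_artist_age` is a hypothetical helper, not a MongoDB API.

```python
# Movie documents with embedded cast, as plain dicts (assumption: in-memory only).
movies = [
    {
        "_id": "12345",
        "title": "Seven",
        "cast": [
            {"name": "Brad Pitt", "gender": "male", "age": 51},
            {"name": "Morgan Freeman", "gender": "male", "age": 70},
        ],
    },
    {
        "_id": "12346",
        "title": "Fight Club",
        "cast": [{"name": "Brad Pitt", "gender": "male", "age": 51}],
    },
]


def update_artist_age(movie_docs, artist_name, new_age):
    """Update every embedded copy of the artist; return how many movies changed."""
    touched = 0
    for movie in movie_docs:
        changed = False
        for member in movie["cast"]:
            if member["name"] == artist_name:
                member["age"] = new_age
                changed = True
        if changed:
            touched += 1
    return touched


touched = update_artist_age(movies, "Brad Pitt", 52)
```

Here one logical change touches two movie documents, and the number grows with every additional movie the artist appears in.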

To avoid repetition of the artist data, use references and keep the artist information in a separate collection from the movie collection.

Artist Collection

{
    "_id": "abc123",
    "name": "Brad Pitt",
    "gender": "male",
    "age": 51
}

{
    "_id": "abc124",
    "name": "Morgan Freeman",
    "gender": "male",
    "age": 70
}

{
    "_id": "abc125",
    "name": "Gwyneth Paltrow",
    "gender": "female",
    "age": 46
}

Movie Collection

{
    "_id": "12345",
    "title": "Seven",
    "cast": ["abc123", "abc124", "abc125"]
}

{
    "_id": "12346",
    "title": "Fight Club",
    "cast": ["abc123"]
}

When using references, the growth of the relationship determines where to store the reference. The number of artists in a movie doesn’t grow, but an artist can keep acting in more movies. So in this case, storing the references in the movie document makes sense.
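
A minimal sketch of working with references, again using Python dicts as stand-ins for the two collections. The helpers `resolve_cast` and `dangling_references` are hypothetical; the second one illustrates that the application, not the database, has to keep the references valid.

```python
# Artist collection keyed by _id for lookup (assumption: in-memory stand-in).
artists_by_id = {
    "abc123": {"_id": "abc123", "name": "Brad Pitt", "gender": "male", "age": 51},
    "abc124": {"_id": "abc124", "name": "Morgan Freeman", "gender": "male", "age": 70},
    "abc125": {"_id": "abc125", "name": "Gwyneth Paltrow", "gender": "female", "age": 46},
}
movies = [
    {"_id": "12345", "title": "Seven", "cast": ["abc123", "abc124", "abc125"]},
    {"_id": "12346", "title": "Fight Club", "cast": ["abc123"]},
]


def resolve_cast(movie, artists):
    """Replace each referenced artist id with the artist document it points to."""
    return [artists[artist_id] for artist_id in movie["cast"] if artist_id in artists]


def dangling_references(movie, artists):
    """Return the cast ids that do not match any artist document."""
    return [artist_id for artist_id in movie["cast"] if artist_id not in artists]


# A single update to the artist document is visible from every movie.
artists_by_id["abc123"]["age"] = 52
ages = [resolve_cast(movie, artists_by_id)[0]["age"] for movie in movies]

# A movie that references an unknown artist id (made up for this sketch).
broken_movie = {"_id": "99999", "title": "Untitled", "cast": ["abc999"]}
missing = dangling_references(broken_movie, artists_by_id)
```

The trade-off is that reading a movie with its full cast now requires resolving the ids, while updating an artist touches exactly one document.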

Quiz

SOLUTION:

Document references are useful when duplication of data needs to be avoided.

Quiz

SOLUTION:

In one-to-many relationships, where the “many” documents are viewed in the context of the “one” or parent documents.

Exercise

Task Description:

Use MongoDB to store the data for a blog. These are the primary entities in a blog:

Post - The actual blog post, which has a title, text, and an author.
Comment - A comment written by a visitor on a particular post. A comment has the name of the visitor and the text of the comment.
Author - A person who writes posts. An author has first_name, last_name, and email.
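
One possible way to model these entities, sketched with Python dicts rather than real MongoDB documents. It is not the only valid answer. It assumes comments are embedded in the post (a “contains” relationship) and the author is referenced (one author writes many posts). The field names `author_id` and `comments`, all `_id` values, and the sample data are made up for illustration.

```python
# Author collection: referenced from posts to avoid repeating author details.
authors = [
    {
        "_id": "author1",
        "first_name": "Ada",
        "last_name": "Example",
        "email": "ada@example.com",
    }
]

# Post collection: comments are embedded because they belong to a single post.
posts = [
    {
        "_id": "post1",
        "title": "Data modeling notes",
        "text": "Embed what is contained, reference what is shared.",
        "author_id": "author1",  # reference to the author document
        "comments": [
            {"name": "Visitor One", "text": "Helpful summary."},
            {"name": "Visitor Two", "text": "Thanks for writing this."},
        ],
    }
]


def author_of(post, author_docs):
    """Resolve the author reference stored on a post, or None if it is missing."""
    for author in author_docs:
        if author["_id"] == post["author_id"]:
            return author
    return None


post_author = author_of(posts[0], authors)
comment_count = len(posts[0]["comments"])
```

If a post could attract a very large number of comments, moving comments to their own collection with a reference to the post would be a reasonable alternative.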

Task Feedback:

Nice. You dabbled in data modeling. Data modeling takes practice, so think of some other scenarios and write down how you would model them.